Computing Highly Specific and Mismatch Tolerant Oligomers Efficiently

Authors

  • Tomoyuki Yamada
  • Shinichi Morishita
Abstract

The sequencing of the genomes of a variety of species and the growing databases of expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about twenty units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about fifty units this is not efficient, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of the mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that uses suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs orders of magnitude faster than the well-known algorithm of Abrahamson and is able to enumerate 63% to approximately 79% of qualified oligomers.
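The definitions in the abstract can be illustrated with a small brute-force sketch. This is not the authors' dynamic programming algorithm; it only makes the quantities concrete. All function names are hypothetical, the suffix and height arrays are built naively, and the sketch assumes that every other window of the text (including windows overlapping the target) counts as a candidate sub-sequence.

```python
def hamming_capped(a, b, cap):
    """Count mismatches between equal-length strings, stopping early at `cap`."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches >= cap:
                break
    return mismatches


def mismatch_tolerance(text, start, length):
    """Smallest Hamming distance from text[start:start+length] to any other window."""
    oligo = text[start:start + length]
    best = length  # value returned when no other window exists
    for q in range(len(text) - length + 1):
        if q == start:
            continue  # skip the target occurrence itself
        best = min(best, hamming_capped(oligo, text[q:q + length], best))
        if best == 0:
            break  # an exact second occurrence exists
    return best


def meets_threshold(text, start, length, d):
    """True if every other window differs from the oligomer in at least d positions."""
    oligo = text[start:start + length]
    return all(
        hamming_capped(oligo, text[q:q + length], d) >= d
        for q in range(len(text) - length + 1)
        if q != start
    )


def suffix_array(text):
    """Naive suffix array: start positions of suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def height_array(text, sa):
    """height[r] = length of the common prefix of suffixes sa[r-1] and sa[r]."""
    height = [0] * len(sa)
    for r in range(1, len(sa)):
        a, b = text[sa[r - 1]:], text[sa[r]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        height[r] = n
    return height


def frequency(sa, height, start, length):
    """Occurrences of text[start:start+length], read off a run of heights >= length."""
    r = sa.index(start)
    lo = r
    while lo > 0 and height[lo] >= length:
        lo -= 1
    hi = r
    while hi + 1 < len(sa) and height[hi + 1] >= length:
        hi += 1
    return hi - lo + 1


genome = "ACGTACGAACGT"  # toy sequence, assumed for illustration
sa = suffix_array(genome)
height = height_array(genome, sa)
print(frequency(sa, height, 0, 4))       # frequency of the oligomer "ACGT"
print(mismatch_tolerance(genome, 4, 4))  # tolerance of the oligomer "ACGA"
print(meets_threshold(genome, 4, 4, 2))  # is the tolerance at least 2?
```

The threshold check stops counting mismatches as soon as a window reaches the threshold, which hints at why testing the constraint can be cheaper than computing the exact tolerance. The brute-force scan still compares against every window, so it is only practical for toy inputs.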


Similar resources

Fault Tolerant DNA Computing Based on Digital Microfluidic Biochips

Historically, DNA molecules have been known as the building blocks of life. In 1994, Leonard Adleman introduced a technique that uses DNA molecules for a new kind of computation. Owing to its massive parallelism, huge storage capacity, and the ability to use DNA molecules inside living tissue, this type of computation is applied in many application areas such as me...


Error models for mode-mismatch in linear optics quantum computing

One of the most significant challenges facing the development of linear optics quantum computing (LOQC) is mode-mismatch, whereby photon distinguishability is introduced within circuits, undermining quantum interference effects. We examine the effects of mode-mismatch on the parity (or fusion) gate, the fundamental building block in several recent LOQC schemes. We derive simple error models for...


The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support ...


DNA mismatch repair in plants. An Arabidopsis thaliana gene that predicts a protein belonging to the MSH2 subfamily of eukaryotic MutS homologs.

Sets of degenerate oligomers corresponding to highly conserved domains of MutS-homolog (MSH) mismatch-repair proteins primed polymerase chain reaction amplification of two Arabidopsis thaliana DNA fragments that are homologous to eukaryotic MSH-like genes. Phylogenetic analysis places one complete gene, designated atMSH2, in the evolutionarily distinct MSH2 subfamily.


Genome-scale design of PCR primers and long oligomers for DNA microarrays.

In recent years, the demand for custom-made cDNA chips/arrays as well as whole genome chips has been increasing rapidly. The efficient selection of gene-specific primers/oligomers is of the utmost importance for the successful production of such chips. We developed GenomePRIDE, highly flexible and scalable software for designing primers/oligomers for large-scale projects. The program is able ...



Journal title:
  • Proceedings. IEEE Computer Society Bioinformatics Conference

Volume: 2

Publication date: 2003